This notebook contains all code required for Multiverse Meta-Analyses, including the generation of specifications, bootstrap data, and visualizations.
import numpy as np
import pandas as pd
from bootstrap import generate_boot_data
from config import read_config
from data import prepare_data
from plotting import (get_cluster_fill_data, get_spec_fill_data,
                      get_colors, plot_treemap, plot_multiverse,
                      plot_caterpillar, plot_sample_size, plot_cluster_size,
                      plot_spec_tiles, plot_cluster_tiles, plot_inferential,
                      plot_p_hist)
from specs import generate_specs
from user_data import preprocess_data
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook"
%load_ext autoreload
%autoreload 2
The interactive Dashboard can be launched from this notebook.
%run -i "./dashboard.py"
In this cell, set the title, the working directory, and the path to the dataset for this analysis. The paths of the configuration file, the preprocessed data, the specifications, and the bootstrap data are derived from the working directory and the title. This naming convention can be changed, but the file-name prefixes (i.e. boot, config, data, and specs) are required for the Dashboard to work. The configuration file must already exist; all other data can either be generated or loaded, as controlled by the boolean flags. Generated data is stored at the specified paths, and existing data is loaded from them.
TITLE = "R2D4D_3"
DIR = "../examples/R2D4D"
DATA_PATH = f"{DIR}/R2D4D.csv"
# TITLE = "Chernobyl_2"
# DIR = "../examples/Chernobyl"
# DATA_PATH = f"{DIR}/Chernobyl.rda"
# TITLE = "IandR_2"
# DIR = "../examples/IandR"
# DATA_PATH = f"{DIR}/iandr.sav"
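PREPROCESS_DATA = True  # True: preprocess the raw data; False: load the preprocessed data
GENERATE_SPECS = True  # True: generate the specs; False: load them from SPECS_PATH
GENERATE_BOOTDATA = True  # True: generate the boot data; False: load it from BOOT_PATH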
PP_DATA_PATH = f"{DIR}/data_{TITLE}.csv"
CONFIG_PATH = f"{DIR}/config_{TITLE}.json"
SPECS_PATH = f"{DIR}/specs_{TITLE}.csv"
BOOT_PATH = f"{DIR}/boot_{TITLE}.csv"
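A minimal sketch of the naming convention, assuming the four prefixes and the file extensions used in the cell above. The helpers expected_paths and missing_inputs are hypothetical (not part of the package); they only illustrate which files must already exist for a given combination of flags before the cells below are run.

```python
from pathlib import Path


def expected_paths(dir_: str, title: str) -> dict[str, Path]:
    """Build the config/data/specs/boot paths from the directory and the title.

    Hypothetical helper mirroring the f-strings above; the prefixes are the
    part the Dashboard relies on.
    """
    extensions = {"config": "json", "data": "csv", "specs": "csv", "boot": "csv"}
    return {prefix: Path(dir_) / f"{prefix}_{title}.{ext}"
            for prefix, ext in extensions.items()}


def missing_inputs(dir_: str, title: str, preprocess: bool,
                   generate_specs: bool, generate_boot: bool) -> list[str]:
    """List the files that must be loaded (not generated) but do not exist yet."""
    paths = expected_paths(dir_, title)
    required = ["config"]          # the config file must always exist
    if not preprocess:
        required.append("data")    # preprocessed data is loaded, not generated
    if not generate_specs:
        required.append("specs")
    if not generate_boot:
        required.append("boot")
    return [name for name in required if not paths[name].exists()]
```

For example, missing_inputs(DIR, TITLE, PREPROCESS_DATA, GENERATE_SPECS, GENERATE_BOOTDATA) returns an empty list when every file that the flags require is in place.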
In this cell, the configuration file is processed. The cell prints the parsed configuration, so that the user can double-check whether the result is as expected.
config = read_config(path=CONFIG_PATH)
if config is not None:
    c_info = [
        f"{config['level']} - Level Meta-Analysis",
        f" Minimum Nr. of Samples to include Specification: {config['k_min']}",
        f" Bootstrap Iterations: {config['n_boot_iter']}",
        f" {config['n_which']} Which-Factors:",
        *[f" {k} : {(', ').join(v)}" for k, v in config['which_lists'].items()],
        f" {config['n_how']} How-Factors:",
        *[f" {k} : {(', ').join(v)}" for k, v in config['how_lists'].items()],
        f" Labels",
        *[f" {l}" for l in config['labels']],
        f" Column-Map",
        *[f" {k} : {v}" for k, v in config['colmap'].items()]
    ]
    print(("\n").join(c_info))
3 - Level Meta-Analysis
Minimum Nr. of Samples to include Specification: 2
Bootstrap Iterations: 100
6 Which-Factors:
sex : men, women, all_sex
method : direct, image, all_method
age_group : adults, non-adults, all_age_group
sample : healthy, clinical, all_sample
race : white, other, all_race
published_estimate : yes, no, all_published_estimate
3 How-Factors:
effect : z
ma_method : REML, ML
test : t-test, z-test
Labels
sex: male
sex: female
sex: either
measure: direct
measure: image
measure: either
age: adults
age: non-adults
age: either
group: healthy
group: patients
group: either
ethnicity: White
ethnicity: non-White
ethnicity: either
report: full
report: not
report: either
metric: z
model: REML
model: ML
test: t
test: z
Column-Map
key_c : Study_name
key_c_id : c_id
key_e_id : e_id
key_z : z
key_z_se : z_se
key_z_var : z_var
key_r : r
key_r_se : r_se
key_r_var : r_var
key_main_es : z
key_main_es_se : z_se
key_n : N
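In the parsed configuration above, there appears to be one label per factor level, in the same order as the which- and how-factors: 6 × 3 which-levels plus 1 + 2 + 2 how-levels gives 23 levels, and 23 labels are listed. Assuming this one-to-one correspondence is intended (an inference from this example, not a documented rule), a quick consistency check could look like the sketch below. check_labels is a hypothetical helper.

```python
def check_labels(which_lists: dict[str, list[str]],
                 how_lists: dict[str, list[str]],
                 labels: list[str]) -> list[tuple[str, str, str]]:
    """Pair every factor level with its label, raising if the counts differ.

    Assumes the labels are listed in the same order as the levels of the
    which-factors followed by the levels of the how-factors.
    """
    levels = [(factor, level)
              for lists in (which_lists, how_lists)
              for factor, values in lists.items()
              for level in values]
    if len(levels) != len(labels):
        raise ValueError(f"{len(levels)} factor levels but {len(labels)} labels")
    return [(factor, level, label)
            for (factor, level), label in zip(levels, labels)]
```

Usage: check_labels(config["which_lists"], config["how_lists"], config["labels"]). Since Python dictionaries preserve insertion order, the pairing follows the order in which the factors appear in the config.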
In this cell, the dataset is either preprocessed and stored at PP_DATA_PATH, or the preprocessed dataset is loaded from PP_DATA_PATH. The cell prints the dimensions and the head of the data. If preprocessing is desired, the function preprocess_data() must be defined by the user in the file user_data.py.
if PREPROCESS_DATA:
    ma_data = preprocess_data(DATA_PATH, title=TITLE)
else:
    ma_data = pd.read_csv(PP_DATA_PATH)
print(f"Data Shape: {ma_data.shape}")
ma_data.head()
Data Shape: (31, 16)
|  | Study_name | publ_yr | publ_yr_recoded | sex | age_group | sample | race | method | published_estimate | N | r | r_se | z | z_se | z_var | r_var |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Manning (2003) | 2003 | 1 | men | adults | healthy | white | direct | yes | 50 | 0.2900 | 0.133598 | 0.298566 | 0.145865 | 0.021277 | 0.017848 |
| 1 | Latourelle (2008) | 2008 | 6 | men | adults | healthy | white | image | no | 35 | 0.0000 | 0.176777 | 0.000000 | 0.176777 | 0.031250 | 0.031250 |
| 2 | Latourelle (2008) | 2008 | 6 | women | adults | healthy | white | image | no | 72 | 0.0000 | 0.120386 | 0.000000 | 0.120386 | 0.014493 | 0.014493 |
| 3 | Mas (2009) | 2009 | 7 | men | adults | healthy | white | image | no | 72 | -0.0685 | 0.119821 | -0.068607 | 0.120386 | 0.014493 | 0.014357 |
| 4 | Mas (2009) | 2009 | 7 | men | adults | clinical | white | image | no | 63 | 0.0021 | 0.129099 | 0.002100 | 0.129099 | 0.016667 | 0.016667 |
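The derived columns in the head above are consistent with the following formulas (checked against the rows shown): z = artanh(r), z_se = 1/sqrt(N - 3), z_var = 1/(N - 3), r_se = (1 - r²)/sqrt(N - 3), and r_var = r_se². For example, r = 0.29 and N = 50 give z ≈ 0.298566 and z_se ≈ 0.145865, as in the first row. A minimal sketch of a user-defined preprocess_data() in user_data.py could therefore look like this. It assumes a CSV input that already contains r and N columns; the actual R2D4D preprocessing may differ, and the title argument is accepted only to match the call above.

```python
from typing import Optional

import numpy as np
import pandas as pd


def preprocess_data(path: str, title: Optional[str] = None) -> pd.DataFrame:
    """Hypothetical user_data.preprocess_data(): derive effect-size columns.

    Assumes the raw file is a CSV with a correlation column "r" and a
    sample-size column "N"; title is unused in this sketch.
    """
    df = pd.read_csv(path)
    n_minus_3 = df["N"] - 3
    df["r_se"] = (1 - df["r"] ** 2) / np.sqrt(n_minus_3)  # SE of r, as in the data above
    df["z"] = np.arctanh(df["r"])                          # Fisher z-transform
    df["z_se"] = 1 / np.sqrt(n_minus_3)                    # SE of Fisher z
    df["z_var"] = 1 / n_minus_3                            # variance of Fisher z
    df["r_var"] = df["r_se"] ** 2
    return df
```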
In this cell, the preprocessed dataset is prepared for meta-analysis. Preparation adds cluster and effect IDs, sets data types, etc. For details, consult the function documentation of prepare_data(). The cell prints the dimensions and the head of the prepared data.
data = prepare_data(config["colmap"], data=ma_data)
print(f"Data Shape: {data.shape}")
data.head()
Data Shape: (31, 18)
|  | c_id | Study_name | e_id | publ_yr | publ_yr_recoded | sex | age_group | sample | race | method | published_estimate | N | r | r_se | z | z_se | z_var | r_var |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Manning (2003) | 1 | 2003 | 1 | men | adults | healthy | white | direct | yes | 50 | 0.2900 | 0.133598 | 0.298566 | 0.145865 | 0.021277 | 0.017848 |
| 1 | 2 | Latourelle (2008) | 2 | 2008 | 6 | men | adults | healthy | white | image | no | 35 | 0.0000 | 0.176777 | 0.000000 | 0.176777 | 0.031250 | 0.031250 |
| 2 | 2 | Latourelle (2008) | 3 | 2008 | 6 | women | adults | healthy | white | image | no | 72 | 0.0000 | 0.120386 | 0.000000 | 0.120386 | 0.014493 | 0.014493 |
| 3 | 3 | Mas (2009) | 4 | 2009 | 7 | men | adults | healthy | white | image | no | 72 | -0.0685 | 0.119821 | -0.068607 | 0.120386 | 0.014493 | 0.014357 |
| 4 | 3 | Mas (2009) | 5 | 2009 | 7 | men | adults | clinical | white | image | no | 63 | 0.0021 | 0.129099 | 0.002100 | 0.129099 | 0.016667 | 0.016667 |
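A minimal sketch of how cluster and effect IDs like those above can be derived with pandas, assuming (as the output suggests) that a cluster is a study identified by the Study_name column, numbered in order of first appearance, and that each row is one effect. add_ids is a hypothetical helper; prepare_data() may do more, such as setting data types.

```python
import pandas as pd


def add_ids(df: pd.DataFrame, cluster_col: str = "Study_name") -> pd.DataFrame:
    """Insert 1-based cluster IDs (c_id) and effect IDs (e_id)."""
    out = df.reset_index(drop=True).copy()
    # ngroup() numbers the groups; sort=False keeps the order of first appearance
    c_id = out.groupby(cluster_col, sort=False).ngroup() + 1
    out.insert(0, "c_id", c_id)
    # One effect per row, numbered consecutively, placed right after the cluster column
    out.insert(out.columns.get_loc(cluster_col) + 1, "e_id",
               list(range(1, len(out) + 1)))
    return out
```

Applied to the five rows shown above, this yields c_id = 1, 2, 2, 3, 3 and e_id = 1 to 5, matching the printed head.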
In this cell, the specifications are either generated and stored at SPECS_PATH, or loaded from SPECS_PATH. For details, consult the function documentation of generate_specs().
if GENERATE_SPECS:
    specs = generate_specs(
        data,
        config["which_lists"],
        config["how_lists"],
        config["colmap"],
        config["k_min"],
        config["level"],
        SPECS_PATH
    )
else:
    specs = pd.read_csv(SPECS_PATH)
print(specs.shape)
specs.head()
100%|██████████| 2916/2916 [00:39<00:00, 74.77it/s]
(340, 20)
|  | sex | method | age_group | sample | race | published_estimate | effect | ma_method | test | mean | lb | ub | p | k | set | set_es | kc | full_set | rank | ci |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 21 | men | image | adults | healthy | white | no | z | REML | z-test | -0.046836 | -0.237284 | 0.147079 | 0.637611 | 2 | 2,3 | 2,4 | 2 | 0 | 1 | 0.384363 |
| 20 | men | image | adults | healthy | white | no | z | REML | t-test | -0.046836 | -0.864575 | 0.838899 | 0.719751 | 2 | 2,3 | 2,4 | 2 | 0 | 2 | 1.703474 |
| 22 | men | image | adults | healthy | white | no | z | ML | t-test | -0.046836 | -0.864575 | 0.838899 | 0.719751 | 2 | 2,3 | 2,4 | 2 | 0 | 3 | 1.703474 |
| 23 | men | image | adults | healthy | white | no | z | ML | z-test | -0.046836 | -0.237284 | 0.147079 | 0.637611 | 2 | 2,3 | 2,4 | 2 | 0 | 4 | 0.384363 |
| 43 | men | image | adults | all_sample | white | no | z | ML | z-test | -0.028613 | -0.181069 | 0.125186 | 0.716490 | 3 | 2,3 | 2,4,5 | 2 | 0 | 5 | 0.306255 |
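The progress bar in the output above counts 2916 candidate specifications, which equals the size of the Cartesian product of all factor levels in the configuration: 3^6 which-combinations times 1 × 2 × 2 how-combinations. The sketch below enumerates that product and, for each which-combination, selects the matching rows, treating levels prefixed with all_ as "no filter" and keeping only subsets with at least k_min effects. It is an illustration under these assumptions, not the implementation of generate_specs(): it ignores the level setting, fits no meta-analysis, and its function names are hypothetical. The real function may also drop or merge specifications, so the sketch's row count need not match the 340 rows above.

```python
from itertools import product
from math import prod

import pandas as pd


def n_candidates(which_lists: dict[str, list[str]],
                 how_lists: dict[str, list[str]]) -> int:
    """Number of which x how combinations before any filtering."""
    return prod(len(v) for v in (*which_lists.values(), *how_lists.values()))


def enumerate_specs(data: pd.DataFrame, which_lists: dict[str, list[str]],
                    how_lists: dict[str, list[str]], k_min: int) -> pd.DataFrame:
    """Enumerate specifications and keep subsets with at least k_min rows.

    Hypothetical sketch: a level starting with "all_" means no filter on that
    factor, and each which-factor is assumed to be a column of the same name.
    """
    which_names, how_names = list(which_lists), list(how_lists)
    rows = []
    for which in product(*which_lists.values()):
        mask = pd.Series(True, index=data.index)
        for name, level in zip(which_names, which):
            if not level.startswith("all_"):
                mask &= data[name] == level
        k = int(mask.sum())
        if k < k_min:
            continue  # too few effects to pool
        for how in product(*how_lists.values()):
            rows.append({**dict(zip(which_names, which)),
                         **dict(zip(how_names, how)),
                         "k": k})
    return pd.DataFrame(rows)
```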
In this cell, the bootstrap data is either generated and stored at BOOT_PATH, or loaded from BOOT_PATH. For details, consult the function documentation of generate_boot_data().
if GENERATE_BOOTDATA:
    boot_data = generate_boot_data(
        specs,
        config["n_boot_iter"],
        data,
        config["colmap"],
        config["level"],
        BOOT_PATH
    )
else:
    boot_data = pd.read_csv(BOOT_PATH)
print(boot_data.shape)
boot_data.head()
100%|██████████| 100/100 [05:35<00:00, 3.36s/it]
(340, 4)
|  | rank | obs | boot_lb | boot_ub |
|---|---|---|---|---|
| 21 | 1 | -0.046836 | -0.201379 | -0.028634 |
| 20 | 2 | -0.046836 | -0.201379 | -0.028634 |
| 22 | 3 | -0.046836 | -0.200635 | -0.028457 |
| 23 | 4 | -0.046836 | -0.200635 | -0.028457 |
| 43 | 5 | -0.028613 | -0.168904 | -0.024644 |
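The bootstrap output pairs each rank of the sorted specification curve with the observed estimate (obs) and lower and upper bootstrap bounds (boot_lb, boot_ub). One common way to obtain such rank-wise bounds, sketched here under the assumption that each bootstrap iteration yields one estimate per specification, is to sort every iteration's curve and take percentiles per rank. How generate_boot_data() resamples the data and which percentiles it uses are not shown in this notebook, so the 2.5/97.5 percentiles and the function name rankwise_bounds are assumptions.

```python
import numpy as np


def rankwise_bounds(observed: np.ndarray, boot: np.ndarray,
                    alpha: float = 0.05) -> np.ndarray:
    """Return an array with columns rank, obs, boot_lb, boot_ub.

    observed: shape (n_specs,), one estimate per specification.
    boot: shape (n_iter, n_specs), bootstrapped estimates per iteration.
    Each iteration's curve is sorted before taking percentiles, so the
    bounds refer to ranks of the curve, not to individual specifications.
    """
    obs_sorted = np.sort(observed)
    boot_sorted = np.sort(boot, axis=1)
    lb = np.percentile(boot_sorted, 100 * alpha / 2, axis=0)
    ub = np.percentile(boot_sorted, 100 * (1 - alpha / 2), axis=0)
    ranks = np.arange(1, observed.size + 1)
    return np.column_stack([ranks, obs_sorted, lb, ub])
```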
In this cell, the fill data for the cluster and specification tile maps is prepared, as well as the list of colors that make up the color scheme. For details, consult the respective function documentation.
cluster_fill_data = get_cluster_fill_data(
    data,
    specs,
    config["colmap"]
)
spec_fill_data = get_spec_fill_data(
    config["n_which"],
    config["which_lists"],
    config["n_how"],
    config["how_lists"],
    specs
)
fill_levels = len(np.unique([v for v in spec_fill_data.values()]))
colors = get_colors(fill_levels)
Here we define variables that are reused across several plots, to keep the plotting calls readable.
colmap = config["colmap"]
k_range = [config["k_min"], max(specs["k"])]
labels = config["labels"]
level = config["level"]
n_total_specs = len(specs)
title = config["title"]
Treemap of the meta-analytic dataset. It visualizes each study and its reported effect sizes, with colors indicating the sample size N (hot colors for small, cold colors for large samples). If a study reports multiple effect sizes, the size of its tile corresponds to the number of reported effect sizes, and the tile's color indicates the average sample size of those effects.
treemap = plot_treemap(data, title, colmap)
treemap.show()
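The aggregation behind the treemap can be sketched with pandas: tile size is the number of effect sizes per study, and tile color is the study's mean sample size. treemap_tiles is a hypothetical helper, and plot_treemap() may aggregate differently.

```python
import pandas as pd


def treemap_tiles(data: pd.DataFrame, study_col: str = "Study_name",
                  n_col: str = "N") -> pd.DataFrame:
    """One row per study: number of effects (tile size) and mean N (tile color)."""
    return (data.groupby(study_col, sort=False)
                .agg(n_effects=(n_col, "size"), mean_n=(n_col, "mean"))
                .reset_index())
```

With plotly.express, such a frame could be passed to px.treemap(tiles, path=["Study_name"], values="n_effects", color="mean_n", color_continuous_scale="RdBu"), which maps small mean samples to red and large ones to blue.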
fig_inferential = plot_inferential(boot_data, title, n_total_specs)
fig_inferential.show()
fig_p_hist = plot_p_hist(specs, title, n_total_specs)
fig_p_hist.show()
fig = plot_multiverse(
    specs,
    n_total_specs,
    k_range,
    cluster_fill_data,
    spec_fill_data,
    labels,
    colors,
    level,
    title,
    fill_levels
)
fig.show()
# fig.write_image("multiverse.pdf")
# fig.write_image("multiverse.pdf", width=1000, height=1500)
fig_cluster_tiles = plot_cluster_tiles(specs, cluster_fill_data, n_total_specs, title)
fig_cluster_tiles.show()
fig_caterpillar = plot_caterpillar(specs, n_total_specs, colors, k_range, title, fill_levels)
fig_caterpillar.show()
fig_cluster_size = plot_cluster_size(specs, k_range, n_total_specs, title)
fig_cluster_size.show()
fig_sample_size = plot_sample_size(specs, k_range, n_total_specs, title)
fig_sample_size.show()
fig_spec_tiles = plot_spec_tiles(specs, n_total_specs, spec_fill_data, labels, colors, k_range, title, fill_levels)
fig_spec_tiles.show()